Network Unfolding Algorithm and Universal
Spatiotemporal Function Approximation*†

Daw-Tung Lin and Judith E. Dayhoff
Institute  for  Systems  Research 
University  of  Maryland 
College  Park,  MD  20742 

Abstract 

It has previously been known that a feed-forward network with time-delays can be
unfolded into a conventional feed-forward network with a time history as input. In
this paper, we show explicitly how this unfolding operation can occur, with a newly
defined Network Unfolding Algorithm (NUA) that involves the creation of virtual units
and the moving of all time delays to a preprocessing stage consisting of the time histories.

The NUA provides a tool for analyzing the complexity of the adaptive time-delay neural
network (ATNN). Using this tool, we conclude that the ATNN reduces network complexity
by at least a factor of $O(n)$ compared to an unfolded backpropagation (BP) net. We then
apply the theorems of Funahashi, Hornik et al., and Stone-Weierstrass to state the general
function approximation ability of the ATNN. We furthermore show in a lemma (Lemma 1)
that the adaptation of time-delays is mathematically equivalent to the adjustment of
interconnections on an unfolded feed-forward network, provided there is a large enough
number ($h\,2^{nd}$) of hidden units. Since this number of hidden units is often impractically
large, we conclude that the TDNN and ATNN are more powerful than BP
with a time history.

*Copyright ©1994 by D.-T. Lin and J. E. Dayhoff. All Rights Reserved.

†This work was supported in part by the National Science Foundation under Grant No. NSF D CDR
8803012 and NSF EEC 94-02384, the Naval Research Laboratory (N00014-90K-2010), and the Applied
Physics Laboratory of Johns Hopkins University.


1  Network  Paradigm  and  Definition 

Networks that can process spatiotemporal patterns can play an important role in application
domains whose signals are naturally time-varying and whose situations are dynamic. These
domains include identification and control as well as signal processing and speech recognition.

An important contribution in this area has been the time-delay neural network (TDNN)
proposed by Waibel et al. [18], which employs time-delays on connections in feedforward
networks and has been successfully applied to speech recognition [19, 9]. The time-delay
neural network also classifies spatiotemporal patterns and provides robustness to noise and
graceful degradation [14]. Techniques such as backpropagation through time [20] have been
applied to temporal pattern recognition but not with adaptive time-delays. In these networks,
time delays are fixed initially and remain the same throughout training. The decisions of how
many time delays are needed and how long each time delay should be are often
made by trial and error. As a result, the system may perform poorly because the time delays
are inflexible and because the chosen time delay values may not match
the temporal location of the important information in the input patterns. In addition,
the system performance may vary depending on the range of the time delay values.

To overcome this limitation, we have used an Adaptive Time-Delay Neural Network (ATNN)
model [6, 14, 13, 7, 15]. This network adapts its time delay values as well as its weights
during training, to better accommodate changing temporal patterns and to provide more
flexibility for optimization tasks. The ATNN used here allows arbitrary placement of time
delays along interconnections and adapts those time delays independently of one another.
Furthermore, time-windows are not used as in previous work [2, 18]; instead, classification
relies on a set of individual time delay values associated with each interconnection.
Although other artificial neural network architectures that make use of time delays have been
suggested [17, 5, 7, 2, 3], these paradigms employ different training rules or different network
topologies, and the network presented here is simple to use and has a general formulation.

The proposed ATNN model employs modifiable time delays along the interconnections
between two processing units, and both time delays and weights are adjusted according to
system dynamics in an attempt to achieve the desired optimization. Node $i$ of layer $h-1$
is connected to node $j$ of the next layer $h$, with the connection line having an independent
time-delay $\tau_{jik,h-1}$ and synaptic weight $w_{jik,h-1}$. To illustrate, a three layered ATNN is shown
in Figure 1.

Next we propose definitions that are used to describe a general ATNN architecture with
flexible configuration.

Definition 1 $n_{ji}$ is the number of time-delays employed on the connections to node $j$ from
node $i$ (i.e., the number of samples in the time frame $T_{ji}$).
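
[Figure 1 layers: input layer, set $\mathcal{N}_1$, $i \in \mathcal{N}_1$, $p = \|\mathcal{N}_1\|$; hidden layer, set $\mathcal{N}_2$,
$j \in \mathcal{N}_2$, $q = \|\mathcal{N}_2\|$; output layer, set $\mathcal{N}_3$, $k \in \mathcal{N}_3$, $r = \|\mathcal{N}_3\|$.]

Figure 1: Three layered ATNN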


Definition 2 Time frame: Time frame $T_{ji}$ is a set of time delays $(\tau_{ji1}, \dots, \tau_{jin_{ji}})$ employed
on the connections to node $j$ from node $i$. The time frame can have an additional index $h$
($T_{ji,h}$) to signify layer $h$ of time-delay.


Definition 3 The set of time frames $\mathcal{T}_i$ for connections that originate at node $i$ is defined
as:

$$\mathcal{T}_i = (T_{1i}, \dots, T_{ji}, \dots, T_{qi}), \qquad (1)$$

where $q$ is the number of units that receive connections from node $i$.


Definition 4 $P_{ji}(t, T_{ji})$ is the set of signal values transmitted to node $j$ from node $i$ via time
frame $T_{ji}$ at time $t$. Thus, $P_{ji}(t, T_{ji}) = [s_i(t - \tau_{ji1}), \dots, s_i(t - \tau_{jin_{ji}})]$, where $s_i(t)$ is the signal
from node $i$ at time $t$.
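
To make Definition 4 concrete, the following is a minimal Python sketch of how the delayed samples might be gathered. It assumes discrete integer delays and a signal stored as an array indexed by time step; the function name `delayed_samples` and the toy signal are illustrative and do not come from the paper.

```python
import numpy as np

def delayed_samples(s_i, t, time_frame):
    """Return P_ji(t, T_ji) = [s_i(t - tau) for each tau in the time frame T_ji].

    s_i        : 1-D array, s_i[t] is the signal of node i at time step t (assumed layout)
    t          : current time step; must satisfy t - max(time_frame) >= 0
    time_frame : iterable of non-negative integer delays (assumed discrete)
    """
    return np.array([s_i[t - tau] for tau in time_frame])

# Toy signal s_i(t) = t and a hypothetical time frame with three delays.
s_i = np.arange(10.0)
P_ji = delayed_samples(s_i, t=8, time_frame=[1, 3, 4])
```

With this toy signal the result is simply the list of time indices `8 - tau`, which makes it easy to check the indexing by eye.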
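
Definition 5 $P_i(t, \mathcal{T}_i)$ is the set of signal values transmitted from node $i$ to other nodes at
time $t$. Thus,

$$P_i(t, \mathcal{T}_i) =
\begin{bmatrix} P_{1i}(t, T_{1i}) \\ \vdots \\ P_{ji}(t, T_{ji}) \\ \vdots \\ P_{qi}(t, T_{qi}) \end{bmatrix}
=
\begin{bmatrix}
[s_i(t - \tau_{1i1}), \dots, s_i(t - \tau_{1in_{1i}})] \\ \vdots \\
[s_i(t - \tau_{ji1}), \dots, s_i(t - \tau_{jin_{ji}})] \\ \vdots \\
[s_i(t - \tau_{qi1}), \dots, s_i(t - \tau_{qin_{qi}})]
\end{bmatrix}$$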

Let $J_{i,h}$ be the set of nodes that receive connections from node $i$ on layer $h$. Node $i$ transmits
the set of pattern values $P_i(t, \mathcal{T}_i)$ to the selected subset $J_{i,h}$. In other words, $\forall j \in J_{i,h}$, node $j$
receives pattern $P_{ji}(t, T_{ji})$ from node $i$ through a time frame $T_{ji}$. The samples in the time
frame $T_{ji}$ correspond to the delays $\tau_{jix}$, where $x = 1, 2, \dots, n_{ji}$, $n_{ji} = \|T_{ji}\|$, $j \in J_{i,h}$,
and $h$ is the index of the layer to which $i$ belongs. In the definitions above, we have omitted
the layer index $h$ for simplicity.

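Definition 6 $\mathcal{N}_h$ is the set of nodes $\{1, 2, \dots, \|\mathcal{N}_h\|\}$ of layer $h$.

Thus, each node $j$ of layer $h$ (i.e., $j \in \mathcal{N}_h$) receives $n_{ji}$ inputs from each $i \in \mathcal{I}_{j,h}$, where $\mathcal{I}_{j,h}$
is the subset of nodes on layer $h-1$ ($\mathcal{I}_{j,h} \subseteq \mathcal{N}_{h-1}$) that connects to unit $j$ on layer $h$. The
total number of inputs to $j$ is: $m_j = \sum_{i \in \mathcal{I}_{j,h}} n_{ji}$.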

In this model, we assume that $p = \|\mathcal{N}_1\|$, $q = \|\mathcal{N}_2\|$, $r = \|\mathcal{N}_3\|$ are fixed, and that the
connectivity between layers is fixed ($\|J_{i,h-1}\|$ for the same layer is fixed), and for the sake
of simplicity, we assume $\|J_{i,1}\| = \|\mathcal{N}_2\|$ and $\|J_{i,2}\| = \|\mathcal{N}_3\|$ (fully connected). In general, the
number of samples in the time frame, $n_{ji}$, and the values of the delays (therefore the set $T_{ji}$)
are variables. In this research, we assume that $n_{ji}$ is fixed (selected) and allow $\tau_{jix,h-1}$ to be
variable. We also assume that $n_{ji}$ is the same for all $i \in \mathcal{I}_{j,h}$ and for all $j \in J_{i,h-1}$. If $n_{ji}$ is
the same (fixed) for all $i \in \mathcal{I}_{j,h}$, then $m_j = n_{ji} \cdot \|\mathcal{I}_{j,h}\|$.

Definition 7 $n_{h-1}$ is defined as the number of time-delays on connections originating at
node $i$ on layer $h-1$, where $n_{h-1} = n_{ji}$ for all $j$.

Then our model possesses the following property:

Property 1 For each node $i \in \mathcal{N}_{h-1}$ connecting to all $j \in J_{i,h-1} \subseteq \mathcal{N}_h$, there is a time
frame $T_{ji}$ with $n_{h-1}$ samples, corresponding to the delays $\tau_{jix,h-1}$, $x = 1, \dots, n_{h-1}$. Inputs to
node $j$ (on layer $h$) are from node $i$ (on layer $h-1$), and there are $n_{h-1} \cdot \|\mathcal{N}_{h-1}\|$ such inputs.

2  NUA  and  Complexity  Analysis 

Researchers have previously indicated that a network with embedded time-delays
can be unfolded into a feedforward network, but have left the unfolding procedure vague. In this
section we make this procedure explicit as a four-step operation, the Network Unfolding
Algorithm (NUA).

For notational simplicity, we use a three layer ATNN to illustrate the unfolding operation
NUA and keep the number of time-delays within the same layer consistent. We use the
definitions of the network configuration from the previous section. Using Definition 6, let
$n_I = \|\mathcal{N}_I\|$, $n_H = \|\mathcal{N}_H\|$, $n_O = \|\mathcal{N}_O\|$ be the number of nodes on the input, hidden and output layers
respectively. Using Definition 7, let $n_1 = d_1$, $n_2 = d_2$ be the number of delays on connections
originating at nodes on the input and hidden layers respectively. From Property 1, there are
$d_1 \cdot n_I$ and $d_2 \cdot n_H$ inputs to each node on the hidden layer and the output layer respectively.
Knowing the number of connections, we can unfold the ATNN by the following algorithm:

Algorithm  1  (Network  Unfolding  Algorithm  (NUA))  : 

Step  1  -  unfold  inputs: 

For  each  hidden  node  j  do  in  parallel: 

For  each  input  node  i  do  in  parallel: 

1.1 duplicate ($d_1 - 1$) new input nodes and spread these nodes horizontally next to
each original input node

1.2 move ($d_1 - 1$) of the original connections to the new input nodes correspondingly
and retain the weight and time-delay on each connection

end 

end 

Step  2  -  re-adjust  input  time  lag. 

For  each  hidden  node  j  do  in  parallel: 

2.1  remove  the  time-delays  between  input  and  hidden  units 

2.2 set the input values as $[P_{j1}(t, T_{j1}), \dots, P_{jn_I}(t, T_{jn_I})]$ accordingly, where $T_{ji}$ is the time
frame (Definition 2) on the connections from node $i$ to $j$, and $P_{ji}$ is the signal value
vector transmitted from node $i$ to $j$ (Definition 4)

end 

Step  3  -  unfold  hidden  nodes: 

For  each  output  node  k  do  in  parallel: 

For  each  hidden  node  j  do  in  parallel: 
3.1 duplicate ($d_2 - 1$) new hidden nodes and spread these nodes horizontally next to
each original hidden node

3.2 move ($d_2 - 1$) of the original connections to the new hidden nodes correspondingly
and retain the weight and time-delay on each connection to output node $k$

3.3  for  each  newly  created  hidden  node  do  in  parallel: 

copy  the  whole  branch  which  associates  with  the  original  hidden  node  in 
Step  2,  and  then  connect  to  that  new  node  as  its  branch  and  retain  the  weights 

end 

end 

end 

Step  4  -  re-adjust  input  time  lag: 

For each hidden node j and its newly created nodes from Step 3 do in parallel:

4.1 remove the associated time-delays between hidden and output units

4.2 re-adjust the input node time lag, such that the input nodes of the branches take the
vectors $[P_{j1}(t - \tau_{kj1}, T_{j1}), \dots, P_{jn_I}(t - \tau_{kj1}, T_{jn_I})], \dots, [P_{j1}(t - \tau_{kjd_2}, T_{j1}), \dots,
P_{jn_I}(t - \tau_{kjd_2}, T_{jn_I})]$ correspondingly

end □
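
The four steps can be sketched in Python for a three-layer net. This is only an illustrative sketch under stated assumptions: delays are integers held in nested lists (`tau1[j][i]` for input-to-hidden, `tau2[k][j]` for hidden-to-output), weights are left out because they are simply retained, and the name `unfold` and the toy delay values are hypothetical rather than taken from the paper.

```python
def unfold(tau1, tau2):
    """Sketch of the NUA bookkeeping for a three-layer ATNN.

    tau1[j][i] : list of d1 delays on the connections input i -> hidden j
    tau2[k][j] : list of d2 delays on the connections hidden j -> output k
    Returns one entry (k, j, y, lags) per virtual hidden node, where lags is the
    list of (i, x, total_lag) triples describing its virtual input nodes.
    """
    branches = []
    for k, per_hidden in enumerate(tau2):            # Step 3: for each output node
        for j, out_delays in enumerate(per_hidden):  # ... and each hidden node
            for y, t2 in enumerate(out_delays):      # one virtual hidden node per delay
                # Steps 1, 2 and 4: one virtual input per (i, x), with the
                # hidden-output delay pushed down into the input time lag.
                lags = [(i, x, t1 + t2)
                        for i, in_delays in enumerate(tau1[j])
                        for x, t1 in enumerate(in_delays)]
                branches.append((k, j, y, lags))
    return branches

# Toy net: 2 inputs, 1 hidden node, 1 output, d1 = d2 = 2 (delay values are made up).
tau1 = [[[1, 3], [2, 4]]]
tau2 = [[[1, 2]]]
branches = unfold(tau1, tau2)
n_virtual_hidden = len(branches)
n_virtual_inputs = sum(len(b[3]) for b in branches)
```

For this toy net the sketch yields 2 virtual hidden nodes and 8 virtual input nodes, which agrees with the products $n_H d_2 n_O$ and $n_I d_1 n_H d_2 n_O$.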

Let $n_I$, $n_H$, $n_O$ be the number of units in the input, hidden and output layers respectively.
Let $d_1$ and $d_2$ be the number of time-delay connections between input-hidden node pairs
and hidden-output node pairs respectively. Table 1 shows how
the numbers of input and hidden units in a feedforward network or BP net (with zero weight on
connections to other nodes) grow as a result of unfolding. If the number of
nodes in each layer is $n$, then the number of hidden nodes increases by a factor of $O(n)$ and
the number of input nodes increases by a factor of $O(n^2)$. The ATNN is therefore a very economical
architecture for achieving the same goal as a feedforward network. For a network with more than
two hidden layers, the unfold operation can be executed by repeating Step 3 and Step 4
from the lowest hidden layer to the highest layer, expanding the net until the
output layer is reached.
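
As a rough numerical illustration of these growth factors, the short Python sketch below evaluates the unfolded node counts $n_H d_2 n_O$ and $n_I d_1 n_H d_2 n_O$. The helper name `unfolded_sizes` and the choice $d_1 = d_2 = 3$ are assumptions made only for this example.

```python
def unfolded_sizes(n_i, n_h, n_o, d1, d2):
    """Node counts of the unfolded three-layer net, using the products
    n_H * d2 * n_O (hidden) and n_I * d1 * n_H * d2 * n_O (input)."""
    hidden = n_h * d2 * n_o
    inputs = n_i * d1 * hidden
    return {"hidden": hidden, "inputs": inputs}

# With n nodes per layer and fixed d1, d2, the hidden layer grows by a factor
# proportional to n and the input layer by a factor proportional to n**2.
for n in (2, 10, 100):
    sizes = unfolded_sizes(n, n, n, d1=3, d2=3)
    print(n, sizes["hidden"] // n, sizes["inputs"] // n)
```

For example, with $n = 10$ and three delays per connection, the 10 hidden nodes become 300 and the 10 input nodes become 9000.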

Figure 2 illustrates the unfold operation graphically for an ATNN with net
configuration 2 → 2 → 1 (labeled as original). In Step 1, two new nodes (2 × 2 − 2 = 2)
are created (drawn as dotted circles) and labeled $I'_1$ and $I'_2$; these two
newly created nodes are then placed next to their original nodes. In Step 2, each associated time-delay is
removed by adjusting the time lag of each input node so that each node receives information
at the correct time slot ($t - \tau_1$, $t - \tau_2$, $t - \tau_3$, and $t - \tau_4$). In Step 3, since there are two synapse
connections from node $H_1$ to output node $O_1$, we need to create another hidden node (labeled
$H'_1$) and then copy the whole branch of node $H_1$ and hook it up to node $H'_1$. In Step 4, the
time-delays $\tau_5$ and $\tau_6$ are removed from the connections between the hidden nodes and the output node.
These time lag factors are pushed down to the input layer, so that each branch inherits its
value from the upper level. In other words, the input vector of the branch under node $H_1$
becomes $[x_1(t - \tau_1 - \tau_5), x_1(t - \tau_2 - \tau_5), x_2(t - \tau_3 - \tau_5), x_2(t - \tau_4 - \tau_5)]$. The input vector of the
other branch (under node $H'_1$) is $[x_1(t - \tau_1 - \tau_6), x_1(t - \tau_2 - \tau_6), x_2(t - \tau_3 - \tau_6), x_2(t - \tau_4 - \tau_6)]$.
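
The equivalence in this example can be checked numerically. The Python sketch below is illustrative only: it assumes integer delays, a sigmoid activation on both layers, and made-up values for the weights and for $\tau_1, \dots, \tau_6$, none of which are specified in the paper. It compares the output of the delayed network with that of its unfolded counterpart, whose branches read inputs at the summed lags.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values: x1 uses (tau1, tau2), x2 uses (tau3, tau4); H1 -> O1 uses (tau5, tau6).
TAU_IN = {1: [1, 3], 2: [2, 4]}
W_IN = {1: [0.5, -0.3], 2: [0.8, 0.1]}
TAU_OUT = [1, 2]
W_OUT = [0.7, -0.4]

def hidden(x, t):
    """H1(t) in the original net: weighted sum of delayed inputs."""
    z = sum(w * x[i][t - tau]
            for i in TAU_IN
            for w, tau in zip(W_IN[i], TAU_IN[i]))
    return sigmoid(z)

def atnn_output(x, t):
    """O1(t) in the original net: weighted sum of delayed hidden activations."""
    return sigmoid(sum(w * hidden(x, t - tau) for w, tau in zip(W_OUT, TAU_OUT)))

def unfolded_lags():
    """Input lags of the two branches (under H1 and H1') after Step 4."""
    return [[(i, t1 + t2) for i in TAU_IN for t1 in TAU_IN[i]] for t2 in TAU_OUT]

def unfolded_output(x, t):
    """Same output computed by the delay-free unfolded net on lagged inputs."""
    flat_w = [w for i in W_IN for w in W_IN[i]]
    total = 0.0
    for w2, branch in zip(W_OUT, unfolded_lags()):
        z = sum(w * x[i][t - lag] for w, (i, lag) in zip(flat_w, branch))
        total += w2 * sigmoid(z)
    return sigmoid(total)

rng = np.random.default_rng(0)
x = {1: rng.normal(size=20), 2: rng.normal(size=20)}
same = bool(np.isclose(atnn_output(x, 10), unfolded_output(x, 10)))
```

Here `same` should be true for any input history long enough to cover the largest summed lag, since both functions evaluate the same nested sum.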

The NUA can also be executed in an alternative order by unfolding the second layer
first using the same procedure. This is illustrated graphically in Figure 3. In Step 1 and
Step 2, the hidden layer is unfolded first by creating a virtual node $H'_1$ and duplicating the
entire branch of the original node $H_1$. The input time lags of each branch are adjusted to
the corresponding time-delays $\tau_5$ and $\tau_6$. In Step 3, the new input nodes are generated and the
entire input layer is unfolded. Finally, in Step 4, each input time lag is readjusted.

[Figure 2 panels: original net and Steps 1 to 4. After Step 2 the input nodes read $x_1(t - \tau_1)$,
$x_1(t - \tau_2)$, $x_2(t - \tau_3)$, $x_2(t - \tau_4)$; after Step 4 the two branches read the same inputs
shifted by a further $\tau_5$ and $\tau_6$ respectively.]

Figure 2: Explicit demonstration of unfold operation.

                                          original net                unfolded net
# of output nodes                         $n_O$          $O(n)$       $n_O$                    $O(n)$
# of connections b/t each H/O node pair   $d_2$          $O(1)$       1                        $O(1)$
# of hidden nodes                         $n_H$          $O(n)$       $n_H d_2 n_O$            $O(n^2)$
# of connections b/t each I/H node pair   $d_1$          $O(1)$       1                        $O(1)$
# of input nodes                          $n_I$          $O(n)$       $n_I d_1 n_H d_2 n_O$    $O(n^3)$
# of connections to each output node      $n_H d_2$
# of connections to each hidden node      $n_I d_1$      $O(n)$
total # of connections to output layer    $n_H d_2 n_O$  $O(n^2)$     $n_H d_2 n_O$            $O(n^2)$
total # of connections to hidden layer    $n_I d_1 n_H$  $O(n^2)$     $n_I d_1 n_H d_2 n_O$    $O(n^3)$

Table 1: Comparison of the original and unfolded network configuration.


3  Hidden Units and Single Layer Time-Delay Adaptation


Property 2 The TDNN is equivalent to its unfolded feedforward network with a sufficient
number of hidden and input units. However, it is usually impractical to have such a large
number of hidden nodes and input nodes in an unfolded configuration. For the
three layered network shown in Table 1, the total number of hidden units in the unfolded
network may grow up to $O(n^2)$, while the total number of input nodes may grow up to
$O(n^3)$.

Lemma 1 (single layer time-delay adaptation) Consider a TDNN to which the ATNN algorithm
is applied to adapt time-delays on only one layer (layer one or layer two) of connections and
to adapt weights on both layers. Then, after each iteration of training, the time-delay changes
of the TDNN can be realized in a feedforward network with a sufficiently large number of
hidden units $h\,2^{nd}$, where $h$ and $n$ are the number of hidden and input units of the ATNN
respectively, and $d$ denotes the sum of the maximum time-delay variables in layer one and layer
two of the TDNN (i.e., $d = \max(T_1) + \max(T_2)$, where $T_1$ and $T_2$ are the time-delay matrices of
layer one and two respectively).

[Figure 3 panels: original net and Steps 1 to 4. After the hidden layer is unfolded the two
branches read the inputs $x_1$, $x_2$ lagged by $\tau_5$ and $\tau_6$ respectively; after Step 4 they read
$x_1(t - \tau_1 - \tau_5)$, $x_1(t - \tau_2 - \tau_5)$, $x_2(t - \tau_3 - \tau_5)$, $x_2(t - \tau_4 - \tau_5)$ and the same
inputs with $\tau_6$ in place of $\tau_5$.]

Figure 3: Alternative NUA by unfolding second layer time-delay first.

Proof: Given $n$ input units and a sufficient number of historical time steps $d$, each hidden node
is associated with $n \cdot d$ inputs (Property 1). For all $n \cdot d$ elements, there are at most $2^{nd}$
combinations (denoted as set $A$) in terms of whether each element is selected or unselected.
Since $d$ is sufficiently large, all the changes in any time-delay elements will fall within the range
covered by these $d$ time steps.

Consider a feedforward net with $h\,2^{nd}$ hidden units in which each hidden unit of the original
ATNN is expanded to $2^{nd}$ units, such that each of the $2^{nd}$ units is associated with one of the
possible combinations (in set $A$) of a connection configuration to the historical time steps of
all input units.

Applying the NUA operation, the ATNN can be unfolded into a feedforward net whose
input nodes take the data $a_i(t - \tau_{\text{layer-one}} - \tau_{\text{layer-two}})$ out of input node $i$, where $\tau_{\text{layer-one}}$ and
$\tau_{\text{layer-two}}$ represent the time-delays in layer one and layer two respectively. For example, in Figure 2,
$\tau_{\text{layer-one}}$ is one of $\tau_1$, $\tau_2$, $\tau_3$, or $\tau_4$, and $\tau_{\text{layer-two}}$ is $\tau_5$ or $\tau_6$.

If $\tau_{\text{layer-two}}$ is fixed, changing the layer-one time-delays is equivalent to selecting
an input set $a_i(t - \tau_{\text{layer-one}})$ with a fixed amount of shift $\tau_{\text{layer-two}}$. The input data set is
a subset of the set $A$ of all input data combinations.

If $\tau_{\text{layer-one}}$ is fixed, changing the layer-two time-delays is equivalent to shifting all input
data $a_i(t - \tau_{\text{layer-one}})$ by the same time step $\tau_{\text{layer-two}}$. The shifted data set is also a subset
of set $A$.

Therefore, $h\,2^{nd}$ hidden units are sufficient to represent all different combinations
of connection configurations to the historical record of all input nodes. With the time-delays of
layer one or layer two adapted, the ATNN is equivalent to an unfolded feedforward net
with a sufficiently large number $h\,2^{nd}$ of hidden units. □
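
To give a feel for why $h\,2^{nd}$ is usually impractical, the counting argument in the proof can be sketched in Python. This is a toy enumeration under the proof's own setup (each of the $n \cdot d$ history elements is either selected or unselected); the function names are hypothetical.

```python
from itertools import product

def connection_configurations(n, d):
    """Enumerate set A: every selected/unselected pattern over the n*d history elements."""
    return list(product((0, 1), repeat=n * d))

def unfolded_hidden_units(h, n, d):
    """Number of hidden units h * 2**(n*d) used in Lemma 1."""
    return h * 2 ** (n * d)

# Even a tiny net shows the growth: n = 2 inputs and d = 2 time steps give 16 patterns.
configs = connection_configurations(2, 2)
small = unfolded_hidden_units(h=3, n=2, d=2)
large = unfolded_hidden_units(h=10, n=10, d=5)
```

For 10 inputs, 5 time steps and 10 hidden units the count is already $10 \cdot 2^{50}$, which is far beyond any practical network size.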

Given Theorem 1 and Corollary 1 (Section 4), we need to consider the optimal weights
$c_j$ and time-delays when the ATNN is applied. Given a sufficient number of hidden
units and time delays, an ATNN with one hidden layer is capable of approximating any
spatiotemporal function to any desired degree of accuracy. In practice, when only a small number of
hidden units is available, we need to adapt the delays in both layers in order to achieve a better mapping
or performance.

Remark  1  Given  a  lower  bound  number  of  hidden  units,  with  a  proper  choice  of  weights 
and  time-delays,  the  ATNN  can  approximate  any  complex  spatiotemporal  function  to  desired 
accuracy.  However,  it  can  get  stuck  in  a  local  minimum.  The  convergence  is  expected  to 
improve  if  additional  hidden  units  are  employed  [10,  12]. 

4  ATNN: Universal Spatiotemporal Function Approximator

The capability of multilayer feedforward networks has been theoretically studied. In previous
work, Hornik, Stinchcombe and White, as well as Funahashi, have concluded that “standard multilayer
feedforward networks with as few as one hidden layer using arbitrary squashing functions are
capable of approximating any Borel measurable function from one finite dimensional space
to another to any desired degree of accuracy, provided sufficiently many hidden units are
available. In this sense, multilayer feedforward networks are a class of universal
approximators” [11]. Similar theorems can also be seen in [8, 16, 4]. The use of superpositions of a sigmoidal
function to achieve universal approximation is also discussed in [1].

In  this  section,  we  rephrase  this  mathematical  statement  and  then  extend  these  results 
to  networks  with  time-delays. 

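Let $r \in \mathbb{N}$ and let $A^r$ be the set of all affine functions from $\mathbb{R}^r$ to $\mathbb{R}$: $A(x) = w \cdot x + b$, where
$w$ and $x$ are vectors in $\mathbb{R}^r$, and $b \in \mathbb{R}$ is a scalar. Let $\phi$ be a measurable function from
$\mathbb{R}$ to $\mathbb{R}$ and let $\Sigma^r(\phi)$ denote the class of functions $\{f(x) = \sum_{i=1}^{q} c_i \phi(A_i(x))\}$ from $\mathbb{R}^r$ to $\mathbb{R}$,
where $x \in \mathbb{R}^r$, $c_i \in \mathbb{R}$, $A_i \in A^r$, $q \in \mathbb{N}$. We adopt Funahashi's main results as below: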

Theorem 1 (adopted and restated from Funahashi [8]) Let $\{x_1, \dots, x_n\}$ be a set of
distinct points of compact set $K$ in Euclidean space $\mathbb{R}^n$ and let $f : \mathbb{R}^n \to \mathbb{R}$ be an
arbitrary real valued function on $K$. If $\phi$ is a bounded and monotone increasing differentiable
function, then for any $\epsilon > 0$, there exists an integer $N$ and real constants $c_i$, $\theta_i$, and $w_{ij}$
($i \in \{1, \dots, N\}$, $j \in \{1, \dots, n\}$) such that $\tilde{f}(x_1, \dots, x_n) = \sum_{i=1}^{N} c_i \phi(\sum_{j=1}^{n} w_{ij} x_j - \theta_i)$ of $\Sigma^n(\phi)$
satisfies $\forall x \in K$, $|f(x) - \tilde{f}(x)| < \epsilon$. That is, with $N$ hidden units, a three layered network of
class $\Sigma(\phi)$ can approximate any function to a desired accuracy.

For the proof of Theorem 1, see Funahashi (1989) [8] and Hornik, Stinchcombe and
White (1989) [11].
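
The form of the approximant in Theorem 1 can be written down directly. The following Python sketch evaluates $\tilde{f}(x) = \sum_i c_i \phi(\sum_j w_{ij} x_j - \theta_i)$ for given parameters; it assumes a logistic sigmoid for $\phi$ (one admissible bounded, monotone increasing choice) and the function name `sigma_network` is hypothetical. It illustrates only the network form, not how the parameters are found.

```python
import numpy as np

def phi(z):
    """Logistic sigmoid, used here as one example of a bounded monotone increasing phi."""
    return 1.0 / (1.0 + np.exp(-z))

def sigma_network(x, c, W, theta):
    """Evaluate f~(x) = sum_i c_i * phi(sum_j W[i, j] * x[j] - theta[i]).

    x     : input vector of length n
    c     : output weights, length N
    W     : hidden weights, shape (N, n)
    theta : hidden thresholds, length N
    """
    return float(c @ phi(W @ x - theta))

# Sanity check with N = 3 hidden units and all-zero hidden weights:
# every hidden unit outputs phi(0) = 0.5, so the sum is 1.5.
value = sigma_network(np.array([1.0, 2.0]), np.ones(3), np.zeros((3, 2)), np.zeros(3))
```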

Similar theorems can be applied to the TDNN and ATNN to approximate a spatiotemporal
function with the extra degrees of freedom in the time-delay domain, considering
the network is unfolded. Because the Stone-Weierstrass theorem played a central role in the
proof of Theorem 1, we state it here for reference and later use. First we give a few necessary
definitions before we introduce the Stone-Weierstrass theorem.

Definition  8  A  family  A  of  real  functions  defined  on  a  compact  set  K  is  an  algebra  if  A  is 
closed  under  addition,  multiplication  and  scalar  multiplication. 

Definition 9 A set $A$ separates points on compact set $K$ if for every $x, y \in K$ with $x \neq y$, there
exists a function $f$ in $A$ such that $f(x) \neq f(y)$.

Definition 10 A set $A$ vanishes at no point of $K$ if for each $x$ in $K$ there exists $f$ in $A$ such
that $f(x) \neq 0$.

Theorem 2 (Stone-Weierstrass) Let $A$ be an algebra of real continuous functions on a
compact set $K$. If $A$ separates points on $K$ and if $A$ vanishes at no point of $K$, then the
uniform closure of $A$ consists of all real continuous functions on $K$.

Let $x_1[t], x_2[t], \dots, x_n[t]$ be $n$ input signal channels measured/observed in time interval
$[t_0, T]$ in a discrete manner. For fixed $\tau_{j,k} > 0$, $j, k \in \mathbb{N}$, and $t - \tau_{j,k}$ in the interval $[t_0, T]$, we
let $\{\langle x_1(t - \tau_{1,1}), \dots, x_1(t - \tau_{1,d}) \rangle, \dots, \langle x_n(t - \tau_{n,1}), \dots, x_n(t - \tau_{n,d}) \rangle\}$ be a set of distinct
points of compact set $K_d$ in Euclidean space $\mathbb{R}^{nd}$. We extend Theorem 1 and obtain the
following corollary:

Corollary 1 (original contribution) Let $f : \mathbb{R}^{nd} \to \mathbb{R}$ be an arbitrary real valued function
on $K_d$. If $\phi$ is a bounded and monotone increasing differentiable function, then for
any $\epsilon > 0$, there exists an integer $N$ and real constants $c_i$, $\theta_i$, and $w_{i,j,k}$ ($i \in \{1, \dots, N\}$,
$j \in \{1, \dots, n\}$, $k \in \{1, \dots, d\}$), such that

$$\tilde{f}(x) = \sum_{i=1}^{N} c_i \, \phi\Big(\sum_{j=1}^{n} \sum_{k=1}^{d} w_{i,j,k} \, x_j(t - \tau_{j,k}) - \theta_i\Big)$$

satisfies $\forall x \in K_d$, $|f(x) - \tilde{f}(x)| < \epsilon$. In other words, with at least one hidden layer of $N$ hidden
units and $d$ time-delay elements in each input-hidden connection pair, a three layered
TDNN network (with delays employed on the first layer) can approximate any spatiotemporal
function to a desired accuracy.
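
The approximant in Corollary 1 differs from that of Theorem 1 only in that the input vector is the $nd$-dimensional time history. A minimal Python sketch follows, assuming integer delays, channels stored as rows of an array, and a logistic sigmoid for $\phi$; the names `time_history` and `tdnn_approximant` and the toy data are illustrative assumptions.

```python
import numpy as np

def phi(z):
    """Logistic sigmoid, one example of a bounded monotone increasing phi."""
    return 1.0 / (1.0 + np.exp(-z))

def time_history(x, t, tau):
    """Build the nd-dimensional point <x_j(t - tau[j, k])> for j = 1..n, k = 1..d.

    x   : array of shape (n, T); row j holds channel x_j over discrete time
    tau : integer array of shape (n, d) with 0 <= tau[j, k] <= t
    """
    n, d = tau.shape
    return np.array([x[j, t - tau[j, k]] for j in range(n) for k in range(d)])

def tdnn_approximant(x, t, tau, c, W, theta):
    """f~ = sum_i c_i * phi(sum_{j,k} w_{i,j,k} x_j(t - tau_{j,k}) - theta_i),
    with the weights w_{i,j,k} flattened into W of shape (N, n*d)."""
    return float(c @ phi(W @ time_history(x, t, tau) - theta))

# Toy data: two channels, x_1(t) = t and x_2(t) = 10 + t, with d = 2 delays each.
x = np.arange(20.0).reshape(2, 10)
tau = np.array([[1, 2], [0, 3]])
point = time_history(x, t=5, tau=tau)
value = tdnn_approximant(x, 5, tau, np.ones(4), np.zeros((4, 4)), np.zeros(4))
```

Once the time history is formed, the computation is an ordinary feedforward evaluation, which is the sense in which the corollary reduces to Theorem 1 on the unfolded input space.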

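Proof: We apply the Stone-Weierstrass Theorem and use the notation above. Given $\phi(\cdot) :
\mathbb{R}^{nd} \to \mathbb{R}$, let $A$ be the set of all affine functions from $\mathbb{R}^{nd}$ to $\mathbb{R}$ such that $A(x) =
w \cdot x + b$, $x \in K_d$, i.e., $x = \{\langle x_1(t - \tau_{1,1}), \dots, x_1(t - \tau_{1,d}) \rangle, \dots, \langle x_n(t - \tau_{n,1}), \dots, x_n(t - \tau_{n,d}) \rangle\}$.

We denote by $\Sigma^{nd}(\phi)$ the class of functions $h(x) : \{\sum_i c_i \phi(A(x))\}$. We need to show that $\Sigma^{nd}(\phi)$
consists of all real functions on $K_d$. First we show that $\phi(A)$ is separating on $K_d$, which ensures that
$\Sigma^{nd}(\phi)$ is separating on $K_d$. Let $K_d \subset \mathbb{R}^{nd}$ be a compact set, so for $\phi$ from $\mathbb{R}^{nd}$ to $\mathbb{R}$, $\Sigma^{nd}(\phi)$ is
an algebra on $K_d$. Pick $a, b \in \mathbb{R}$, $a \neq b$, such that $\phi(a) \neq \phi(b)$. Pick $A(\cdot)$ such that $A(x) = a$
and $A(y) = b$, $x, y \in K_d$; then we obtain $\phi(A(x)) \neq \phi(A(y))$. Therefore $\phi(A)$ is separating on
$K_d$.

Secondly, we need to show that $\phi(A)$ vanishes at no point on $K_d$. Pick $b \in \mathbb{R}$ such that
$\phi(b) \neq 0$ and set $A(x) = 0 \cdot x + b$. For all $x \in K_d$, $\phi(A(x)) = \phi(b)$, and the result follows. □
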
The proof is mathematically equivalent to that of Hornik, Stinchcombe and White's
theorem of universal approximation with multilayer feedforward networks. We extend from an $n$
dimensional input space to an $nd$ dimensional space. In other words, each distinct input point $x_i$
in the $n$ dimensional input space is expanded into a distinct point set $\langle x_i(t - \tau_1), \dots, x_i(t - \tau_d) \rangle$
in the $nd$ dimensional input space.

5  Conclusion 

The ATNN provides more dynamics and flexibility for the network itself to approach an
efficient performance level and to optimize its configuration. A network unfolding algorithm
(NUA) has been explicitly defined to formulate the unfolding operation from a TDNN or
ATNN to a feedforward network. This procedure provides a way to analyze the complexity of the
ATNN. The NUA also provides a guideline for hardware conversion from a conventional
feed-forward network without redesign overhead. An extended corollary and a lemma
are proposed such that the ATNN inherits the properties and capabilities of the feedforward
network as a universal function approximator.